Tree-Structured Classifier
Abstract
A tree-structured classifier is a decision tree for predicting a class variable from one or more predictor variables. THAID [15, 7] was the first such algorithm. This article focuses on the CART® [2], C4.5 [17], and GUIDE [12] methods. The algorithms are briefly reviewed and their similarities and differences compared on a real data set and by simulation.

In a typical classification problem, we have a training sample L = {(X1, Y1), (X2, Y2), ..., (XN, YN)} of N observations, where each X = (X1, ..., XK) is a K-dimensional vector of predictor variables and Y is a class variable that takes one of J values. We want to construct a rule for predicting the Y value of a new observation given its value of X. If the predictor variables are all ordered, i.e., non-categorical, some popular classifiers are linear discriminant analysis (LDA), nearest neighbor, and support vector machines. (Categorical predictor variables can be accommodated by transformation to vectors of 0-1 dummy variables.) Although these classifiers often possess good prediction accuracy, they act like black boxes and do not provide much insight into the roles of the predictor variables.

A tree-structured classifier (or classification tree) is an attractive alternative because it is easy to interpret. It is a decision tree obtained by recursive partitioning of the X-space. An observation in a partition is predicted to belong to the class with minimum estimated misclassification cost. Classification trees have been demonstrated to possess high prediction accuracy compared to many other methods; see, e.g., Lim et al. [11], Perlich et al. [16], and Loh [12]. They do not require categorical predictor variables to be transformed. THAID [15, 7] is the first published algorithm. We review here the CART® [2], C4.5 [17], and GUIDE [12] algorithms and illustrate their similarities and differences on a real data set and by simulation.
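The two ideas above, recursive partitioning of the X-space and predicting the class with minimum estimated misclassification cost in each partition, can be illustrated with a minimal Python sketch. It is not the CART, C4.5, or GUIDE procedure: the exhaustive search over midpoint cuts scored by Gini impurity, the depth limit, the dictionary-based tree, and all function names (`gini`, `min_cost_class`, `best_split`, `build_tree`, `predict`) are assumptions made for illustration, and only ordered predictors are handled.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def min_cost_class(labels, classes, cost):
    """Return the class with minimum estimated misclassification cost.

    cost[(true, predicted)] is the cost of predicting `predicted` when the
    truth is `true`. Missing entries default to 0 for a correct prediction
    and 1 for an error (hypothetical convention for this sketch).
    """
    counts = Counter(labels)

    def expected_cost(pred):
        return sum(n * cost.get((true, pred), 0.0 if true == pred else 1.0)
                   for true, n in counts.items())

    return min(classes, key=expected_cost)


def best_split(X, y):
    """Search every split 'x[k] <= c' and return (impurity, k, c), or None."""
    best = None
    n = len(y)
    for k in range(len(X[0])):
        values = sorted(set(row[k] for row in X))
        for lo, hi in zip(values, values[1:]):
            c = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [yi for row, yi in zip(X, y) if row[k] <= c]
            right = [yi for row, yi in zip(X, y) if row[k] > c]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, k, c)
    return best


def build_tree(X, y, classes, cost=None, max_depth=3, depth=0):
    """Recursively partition the data; leaves store the minimum-cost class."""
    cost = cost or {}
    split = None
    if depth < max_depth and len(set(y)) > 1:
        split = best_split(X, y)
    if split is None:
        return {"predict": min_cost_class(y, classes, cost)}
    _, k, c = split
    left = [i for i, row in enumerate(X) if row[k] <= c]
    right = [i for i, row in enumerate(X) if row[k] > c]
    return {
        "var": k,
        "cut": c,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           classes, cost, max_depth, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            classes, cost, max_depth, depth + 1),
    }


def predict(tree, x):
    """Send an observation down the tree and return the leaf's class."""
    while "predict" not in tree:
        tree = tree["left"] if x[tree["var"]] <= tree["cut"] else tree["right"]
    return tree["predict"]


# Toy training sample with K = 2 ordered predictors and J = 2 classes.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [6.0, 1.0], [7.0, 3.0], [8.0, 2.0]]
y = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(X, y, classes=["a", "b"])
print(predict(tree, [2.5, 6.0]), predict(tree, [7.5, 0.0]))
```

With equal misclassification costs the leaf rule reduces to a majority vote. Passing a cost table such as `{("b", "a"): 5.0}` makes it five times as costly to label a true "b" as "a", which can change the predicted class of a mixed leaf.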
Similar resources
Tree Kernel Usage in Naive Bayes Classifiers
We present a novel approach in machine learning by combining naïve Bayes classifiers with tree kernels. Tree kernel methods produce promising results in machine learning tasks containing tree-structured attribute values. These kernel methods are used to compare two tree-structured attribute values recursively. Up to now tree kernels are only used in kernel machines like Support Vector Machines o...
Voltage Sag Compensation with DVR in Power Distribution System Based on Improved Cuckoo Search Tree-Fuzzy Rule Based Classifier Algorithm
A new technique is presented to improve the performance of the dynamic voltage restorer (DVR) for voltage sag mitigation. This control scheme is based on the cuckoo search algorithm with a tree fuzzy rule based classifier (CSA-TFRC). CSA is used to optimize the output of TFRC so that the classification output of the network is enhanced. While, the combination of cuckoo search algorithm, fuzzy and decision tree...
Report of the symposium "Lernen, Wissen, Adaptivität 2011" of the GI special interest groups KDML, IR and WM, LWA 2011, Magdeburg, 28.-30. September 2011
We present a novel approach in machine learning by combining naïve Bayes classifiers with tree kernels. Tree kernel methods produce promising results in machine learning tasks containing tree-structured attribute values. These kernel methods are used to compare two tree-structured attribute values recursively. Up to now tree kernels are only used in kernel machines like Support Vector Machines o...
A New Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes Classifier for Coping with Gene Ontology-based Features
The Tree Augmented Naïve Bayes classifier is a type of probabilistic graphical model that can represent some feature dependencies. In this work, we propose a Hierarchical Redundancy Eliminated Tree Augmented Naïve Bayes (HRE–TAN) algorithm, which considers removing the hierarchical redundancy during the classifier learning process, when coping with data containing hierarchically structured feat...
A Comparative Study of Classification Based Personal E-mail Filtering
This paper addresses personal E-mail filtering by casting it in the framework of text classification. Modeled as semi-structured documents, Email messages consist of a set of fields with predefined semantics and a number of variable length free-text fields. While most work on classification either concentrates on structured data or free text, the work in this paper deals with both of them. To p...
Multiple Classifier Boosting and Tree-Structured Classifiers
Visual recognition problems often involve classification of myriads of pixels, across scales, to locate objects of interest in an image or to segment images according to object classes. The requirement for high speed and accuracy makes the problems very challenging and has motivated studies on efficient classification algorithms. A novel multi-classifier boosting algorithm is proposed to tackle...